Extracting audio components of a portion of video to facilitate editing audio of the video

ABSTRACT

Systems and methods for extracting audio components of a portion of a video to facilitate editing the audio portion are presented. In one or more aspects, a system is provided that includes a receiving component configured to receive a video as an upload from a client device over a network and an identification component configured to identify two or more different audio components of an audio track of the video. The system further comprises an extraction component configured to extract and separate the two or more different audio components, and an editing component configured to generate an editing interface that receives input via the editing interface regarding editing the two or more different audio components separately.

TECHNICAL FIELD

This application generally relates to systems and methods for extracting audio components of a portion of video to facilitate editing audio of the video.

BACKGROUND

Many mobile devices allow users to capture video and share captured videos with others through media publishing websites. Absent sophisticated video editing tools or videographer editing expertise, video content uploaded to media publishing websites by ordinary users is often incomplete and of amateur quality.

BRIEF DESCRIPTION OF THE DRAWINGS

Numerous aspects, embodiments, objects and advantages of the present invention will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

FIG. 1 illustrates an example system for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 2 presents an example process for extracting an audio track and video track from a video file in accordance with various aspects and embodiments described herein;

FIG. 3 presents an example user interface for editing audio components of an audio track jointly or separately in accordance with various aspects and embodiments described herein;

FIG. 4 presents another example user interface for editing audio components of an audio track jointly or separately in accordance with various aspects and embodiments described herein;

FIG. 5 illustrates another example system for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 6 illustrates another example system for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 7 illustrates another example system for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 8 presents an example process for combining an edited audio track with an associated video track to generate an edited video file in accordance with various aspects and embodiments described herein;

FIG. 9 illustrates another example system for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 10 illustrates another example system for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 11 is a flow diagram of an example method for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 12 is a flow diagram of another example method for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 13 is a flow diagram of another example method for editing an audio track of a video file in accordance with various aspects and embodiments described herein;

FIG. 14 is a schematic block diagram illustrating a suitable operating environment in accordance with various aspects and embodiments; and

FIG. 15 is a schematic block diagram of a sample computing environment in accordance with various aspects and embodiments.

DETAILED DESCRIPTION

The innovation is described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of this innovation. It may be evident, however, that the innovation can be practiced without these specific details. In other instances, well-known structures and components are shown in block diagram form in order to facilitate describing the innovation.

By way of introduction, the subject matter described in this disclosure relates to systems and methods for editing video using an application running at a server accessible to a client device (e.g., via the cloud). For example, a user can record a video on a client device and upload the video to a networked media sharing system accessible to the client device via a network (e.g., the Internet). Once the video is uploaded, the user can employ video editing tools provided at the networked media sharing system to edit the video. In particular, the disclosed systems and methods offer video editing tools that facilitate editing an audio track associated with a video (e.g., the audio portion of the video as opposed to the visual image portion of the video).

For example, an editing system running at or employed by the networked media sharing system separates video, comprising video images and audio, into respective video and audio tracks. The editing system parses the audio track(s) and identifies two or more different audio components or layers present in the audio track(s).

For example, different audio components or layers of an audio track can include a voice of a first person, a voice of a second person, a sound of a train, a sound of a siren, music, muffled sounds of a crowd, etc. The editing system provides editing tools to apply to the different audio components jointly or separately. For example, the editing system can allow a user to mute one of the audio components and increase volume of another audio component. After editing different audio components, the editing system can re-join the different audio components and generate an edited audio track that reflects edits applied to the different audio components. The system can re-join the edited audio track with the original video track to produce an edited video.

In one or more aspects, a system is provided that includes a receiving component configured to receive video as an upload from a client device over a network and an extraction component configured to separate an audio track and a video track from the video. The system further includes an identification component configured to identify two or more different audio components of the audio track, wherein the extraction component is further configured to separate the two or more different audio components; and an editing component configured to generate an editing interface and provide the client device one or more options for editing the two or more different audio components jointly or separately via the editing interface.

In another aspect, a method is disclosed that includes receiving a video as an upload from a client device over a network, separating an audio track and a video track from the video, identifying two or more different audio components of the audio track, separating the two or more different audio components, generating an editing interface, and providing the client device one or more options for editing the two or more different audio components jointly or separately via the editing interface.

Further provided is a tangible computer-readable storage medium comprising computer-readable instructions that, in response to execution, cause a computing system to perform various operations. These operations can include receiving a video as an upload from a client device over a network, separating an audio track and a video track from the video, identifying two or more different audio components of the audio track, and separating the two or more different audio components. The operations further include generating an editing interface, providing the client device one or more editing options for editing the two or more different audio components via the editing interface, receiving a request to apply an editing option of the one or more editing options to only a first one of the two or more different audio components, and applying the editing option to only the first one of the two or more different audio components in response to the request.

Referring now to the drawings, with reference initially to FIG. 1, presented is a diagram of an example system 100 that facilitates editing an audio portion of a video in accordance with various aspects and embodiments described herein. Aspects of systems, apparatuses or processes explained in this disclosure can constitute machine-executable components embodied within machine(s), e.g., embodied in one or more computer readable mediums (or media) associated with one or more machines. Such components, when executed by the one or more machines, e.g., computer(s), computing device(s), virtual machine(s), etc., can cause the machine(s) to perform the operations described.

System 100 includes media editing system 104 and one or more client devices 120 configured to connect to media editing system 104 via one or more networks 118. Media editing system 104 is configured to receive a media item from a client device 120 via a network 118 and facilitate editing the media item. For example, a client device 120 can upload a video to media editing system 104 over a network and employ the media editing system 104 to edit the video in manners discussed herein. The media editing system 104 can provide the edited video back to the client device 120 for usage thereof, send the edited video to another device via a network 118, post the edited video at a networked resource, store the edited video, etc.

In an aspect, media editing system 104 can be used in association with a media sharing system 102. According to this aspect, media sharing system 102 can include media editing system 104 or access media editing system 104 via a network 118. Media sharing system 102 can include an entity configured to receive media content from one or more client devices 120 via a network 118 and provide the media content to one or more clients via network 118. In an aspect, media sharing system 102 employs media editing system 104 to provide tools for editing media content uploaded to media sharing system 102.

As used herein, the term media content or media item can include but is not limited to streamable media (e.g., video, live video, video advertisements, music videos, audio, music, sound files, etc.) and static media (e.g., pictures, thumbnails). In an aspect, media sharing system 102 can employ one or more server computing devices to store and deliver media content to users of client devices 120 that can be accessed using a browser. For example, media sharing system 102 can provide and present media content to a user via a website.

In an aspect, media sharing system 102 is configured to provide streamed media to users over network 118. The media can be stored in memory (not shown) associated with media sharing system 102 and/or at various servers employed by media sharing system 102 and accessed by a client device 120 using a website platform of the media sharing system 102. For example, media sharing system 102 can include a media presentation source that provides client device 120 access to a voluminous quantity (and potentially an inexhaustible number) of shared media (e.g., video and/or audio) files. The media presentation source can further stream these media files to one or more users at respective client devices 120 of the one or more users over one or more networks 118. In another aspect, media sharing system 102 is configured to receive media files from one or more client devices 120 via one or more networks 118. For example, client device 120 can upload a video to media sharing system 102 via the Internet for sharing with other users using the media sharing system 102. Videos received by media sharing system 102 can further be stored in memory (not shown) employed by the media sharing system 102.

Client device 120 can include any suitable computing device associated with a user and configured to interact with media sharing system 102 and media editing system 104. For example, client device 120 can include a desktop computer, a laptop computer, a television, a mobile phone, a smart-phone, a tablet personal computer (PC), or a personal digital assistant (PDA). As used in this disclosure, the terms “content consumer” or “user” refer to a person, entity, system, or combination thereof that employs system 100 (or additional systems described in this disclosure) using a client device 120. Network(s) 118 can include wired and wireless networks, including but not limited to, a wide area network (WAN, e.g., the Internet), a cellular network, a local area network (LAN), or a personal area network (PAN). For example, client device 120 can provide and/or receive media to/from media sharing system 102 or media editing system 104 (and vice versa) using virtually any desired wired or wireless technology, including, for example, cellular, WAN, wireless fidelity (Wi-Fi), Wi-Max, WLAN, etc. In an aspect, one or more components of system 100 are configured to interact via disparate networks.

Media editing system 104 is configured to offer editing tools for editing a media item that at least includes an audio track. In an aspect, media editing system 104 is configured to edit a video that has both an audio portion or audio track and a video image portion or video track—the terms portion and track are used herein interchangeably. According to this aspect, the media editing system 104 is configured to separate the audio track and video track of a video file and provide various tools for editing the audio track. The media editing system 104 can later join the edited audio track with the video track previously separated therefrom to create a version of the video with the edited audio track. In another aspect, media editing system 104 can provide editing tools for an audio track that is not associated with a video or video track.

Media editing system 104 can include receiving component 106 for receiving video and audio files, extraction component 108 for extracting an audio track from a video file and for extracting audio components from an audio track, identification component 110 for identifying different audio components of an audio track, and editing component 112 for editing the different components of the audio track. Media editing system 104 includes memory 116 for storing computer executable components and instructions. Media editing system 104 can further include a processor 114 to facilitate operation of the instructions (e.g., computer executable components and instructions) by media editing system 104.

Receiving component 106 is configured to receive a video or audio track transmitted to media editing system 104 or media sharing system 102 from a client device via a network 118. For example, a client device 120 can upload or otherwise send a video or audio track to media editing system 104 via a network 118 for editing thereof. According to this example, the video or audio track is intercepted by receiving component 106. In another example, where media sharing system 102 includes media editing system 104, a client device 120 can upload or otherwise send a video or audio track to media sharing system 102. In an aspect, a user of client device 120 can further choose to edit the uploaded video or audio track. The user can then access the video at the media sharing system 102 and select the video for editing (e.g., via a user interface generated by the media sharing system 102 or media editing system 104). Selection of the video for editing can result in sending of the video by media sharing system 102 to the media editing system 104 for editing thereof. In another aspect, the media sharing system 102 can automatically send a video or audio track uploaded thereto to media editing system 104 for editing. According to this aspect, videos or audio tracks sent to media editing system 104 by media sharing system 102 are received by receiving component 106.

In an aspect, extraction component 108 is configured to separate an audio track and video track from a video file received by media editing system 104. Referring ahead to FIG. 2, presented is a diagram demonstrating an example extraction process 200 of extraction component 108. FIG. 2 includes a first bar 202 representing a video file, a second bar 204 representing the extracted video portion of the video file and a third bar 206 representing the extracted audio portion of the video file. The video file and its extracted parts are depicted separated into four segments 216, 218, 220, 222. In an aspect, the segments are associated with frames of the video. It should be appreciated that although the video file represented by bar 202 is depicted as having four segments or frames, a video file received by media editing system 104 can include any suitable number N of segments or frames (N is an integer). Still in other aspects, a video file received and processed by media editing system 104 can be organized and displayed as a single segment or frame.

The video file represented by bar 202 includes an audio portion/audio track and a video portion/video track. The video portion of the video file is represented by the diagonal patterned lines of bar 202. The audio portion is collectively represented by the four different lines 208, 210, 212, and 214 spanning across the segments of bar 202. The four different lines 208, 210, 212, and 214 represent different audio components or layers of the audio portion of the video file. For example, line 208 can represent dialogue between actors in the video, line 210 can represent muffled background noise occurring in the video, line 212 can represent a song playing in the video, and line 214 can represent clapping and cheering of an audience during the video.

The extraction component 108 is configured to separate the video portion and audio portion of a video file. In particular, as seen in FIG. 2, the extraction component can separate the video file represented by bar 202 into bar 204 and bar 206. Bar 204 represents the video portion of the video file separated from the audio portion and bar 206 represents the audio portion of the video file separated from the video portion.
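For concreteness, the following is a minimal sketch of this separation step, assuming the ffmpeg command-line tool is available; the file names and the choice of decoded PCM audio output are illustrative assumptions, not part of the disclosed system.

```python
import subprocess

def demux(video_path: str) -> tuple[str, str]:
    """Split a video file into a video-only track and an audio-only track."""
    video_track = "video_only.mp4"   # hypothetical output names
    audio_track = "audio_only.wav"
    # -an drops the audio; -c:v copy keeps the video stream untouched
    subprocess.run(["ffmpeg", "-y", "-i", video_path,
                    "-an", "-c:v", "copy", video_track], check=True)
    # -vn drops the video; decode audio to PCM for later per-component editing
    subprocess.run(["ffmpeg", "-y", "-i", video_path,
                    "-vn", audio_track], check=True)
    return video_track, audio_track
```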

Referring back to FIG. 1, in another aspect, extraction component 108 is configured to extract different audio components or layers of an audio track. (The terms audio component and audio layer are used herein interchangeably.) For example, in addition to extracting an audio track from a video file, the extraction component 108 can extract different identified audio components of the audio track. In another example, when the receiving component 106 receives an audio file, the extraction component 108 can extract different identified audio components from the audio file. As discussed infra, these different audio components of an audio track or audio file can be identified by identification component 110.

The term audio component or audio layer refers to a distinct sound present in an audio track. For example, as exemplified above with respect to FIG. 2, an audio track could include several audio components or distinct sounds such as dialogue between actors, muffled background noise, a song, and clapping and cheering of an audience. In an aspect, a sound is considered distinct as a function of the source of the sound. For example, different sources can provide different sounds (e.g., different people, different groups of people, different animals, different objects, different instruments, different inputs of sound, etc.). In another aspect, a sound present in an audio track can be considered distinct based on various features including but not limited to type, intensity, pitch, tone, and harmony. Still in other aspects, a sound can be considered distinct as a function of words spoken and language employed to create the sound.

The identification component 110 is configured to identify different audio components present in an audio track. For example, with reference to FIG. 2, bar 206, representative of the audio track, is depicted with four different audio components represented by lines 208, 210, 212 and 214. The identification component 110 is configured to identify these different audio components so that they can be extracted by extraction component 108. In one aspect, the identification component 110 is configured to identify different audio components merely as distinct sounds present in the audio track. In another aspect, the identification component 110 can determine or infer what the distinct sounds are. For example, with reference to FIG. 2, in one aspect, the identification component 110 can identify four different audio components present in the audio track, the components represented by lines 208, 210, 212 and 214. In another aspect, the identification component 110 can further determine that the audio component represented by line 208 is dialogue between actors, the audio component represented by line 210 is muffled background noise, the audio component represented by line 212 is a song, and the audio component represented by line 214 is clapping and cheering of an audience. In one or more additional aspects, the identification component 110 can also identify and note features of the audio components such as intensity, volume, tone, pitch, harmony, etc.

The identification component 110 can employ various mechanisms to identify different audio components (and characteristics of the different audio components) present in an audio track. In an aspect, identification component 110 can analyze frequency patterns generated by the various sounds present in an audio track to identify distinguishable patterns. The identification component 110 can then classify each distinguishable pattern as a different audio component. For example, the identification component 110 can distinguish between different frequency bands based on different oscillation patterns associated with the frequency bands to identify different audio components of an audio track.
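One illustrative way to realize such frequency-pattern analysis is sketched below, clustering short-time spectra so that frames with similar oscillation patterns group together; the STFT parameters, the choice of k-means clustering, and the component count are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import stft
from sklearn.cluster import KMeans

def identify_components(samples: np.ndarray, rate: int, n_components: int = 4):
    """Label each short-time frame with one of n_components spectral clusters."""
    _, _, spectrum = stft(samples, fs=rate, nperseg=1024)
    magnitudes = np.abs(spectrum).T   # one row of magnitudes per time frame
    labels = KMeans(n_clusters=n_components, n_init=10).fit_predict(magnitudes)
    return labels                     # frame-wise component assignment
```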

In another aspect, identification component 110 can compare known frequency patterns stored in memory 116 to frequency patterns present in an audio track. The identification component 110 can further determine whether a frequency pattern present in the audio track represents a distinct audio component and/or what the distinct audio component is based on a degree of similarity between the known frequency pattern and the frequency pattern present in the audio track. For example, memory 116 can store a look-up table having various known frequency patterns respectively representative of known sounds. According to this example, a frequency pattern identified as pattern #124 could represent a frequency pattern for an ambulance siren. The identification component 110 can match a frequency pattern in an audio track to that of pattern #124 and determine that the frequency pattern in the audio track is an ambulance siren. The identification component 110 can then classify the frequency pattern present in the audio track as a distinguishable audio component and note that the audio component is an ambulance siren.
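A minimal sketch of this look-up-table comparison follows; the reference table contents, the cosine-similarity measure, and the matching threshold are assumptions, and the pattern label merely echoes the example above.

```python
import numpy as np

# hypothetical reference table; in practice stored in memory 116
KNOWN_PATTERNS: dict[str, np.ndarray] = {
    # e.g., "pattern_124_ambulance_siren": mean magnitude spectrum of a siren
}

def classify(component_spectrum: np.ndarray, threshold: float = 0.9):
    """Return the label of the best-matching known pattern, if close enough."""
    best_label, best_score = None, 0.0
    for label, reference in KNOWN_PATTERNS.items():
        # cosine similarity between the observed and reference spectra
        score = float(np.dot(component_spectrum, reference) /
                      (np.linalg.norm(component_spectrum) *
                       np.linalg.norm(reference)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None
```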

In another aspect, where an audio component includes spoken language, identification component 110 can employ voice to text recognition software to convert the spoken language into a text file to identify the audio component. The identification component 110 can analyze the text file to further identify what the audio component is and the source of the audio component (e.g., what person is speaking). In an aspect, identification component 110 can match a text file of an audio component to a known reference text file to facilitate identifying the audio component. For example, identification component 110 can access a text reference file for a speech spoken by the President to identify a voice to text interpretation of an audio component as the same speech.
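The transcript-to-reference comparison might look like the sketch below; how the voice to text conversion itself is performed is left to any speech recognition engine, and the similarity threshold is an assumption.

```python
from difflib import SequenceMatcher

def matches_reference(transcript: str, reference_text: str,
                      threshold: float = 0.8) -> bool:
    """True when the transcript is sufficiently similar to the reference text."""
    ratio = SequenceMatcher(None, transcript.lower(),
                            reference_text.lower()).ratio()
    return ratio >= threshold
```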

In yet another aspect, identification component 110 can analyze tags or metadata associated with an audio track to facilitate identifying different audio components present in the audio track. For example, a video file can be received by media editing system 104 with annotations or tags identifying one or more audio components of the audio portion of the video file. Similarly, an audio file comprising an audio track can be received by media editing system 104 with annotations or tags identifying one or more audio components of the audio track. The identification component 110 can further employ the tags or annotations associated with an audio track to easily identify the different components of the audio track.

According to this aspect, when a video file or audio track is recorded at a client device 120, hardware (e.g., different microphones) or software associated with the client device 120 can distinguish between different sounds being received. The client device 120 can include software that then annotates the different sounds with metadata. In some aspects, the annotations can merely identify different components or sounds in the audio track. In other aspects, the annotations can characterize the type or source of the sound (e.g., the annotations can indicate a particular sound is a person speaking or a dog barking). It should be appreciated that the degree and specificity of annotations of an audio track will vary based on the technical sophistication of hardware and/or software employed by the client device 120.
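One possible shape for such client-side annotations is sketched below; the field names and values are hypothetical, as no particular metadata format is prescribed by the disclosure.

```python
# illustrative annotation metadata attached to an uploaded audio track
annotations = {
    "audio_components": [
        {"id": 1, "label": "person speaking", "source": "microphone_1",
         "frames": [216, 218, 220, 222]},
        {"id": 2, "label": "background noise", "source": "microphone_2",
         "frames": [216, 218, 220, 222]},
    ]
}
```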

For example, client device 120 may recognize dominant sounds in an audio recording and characterize those sounds while grouping extraneous sounds into a classification as background noise. According to this example, a video recording of a polo match may include a plurality of distinguishable sounds, such as the sound of a chatting crowd, the sound of the match announcer, the sound of running horses, the sound of the players grunting and calling out plays, the sound of mallets hitting the chucker, the sound of cars coming and going, the sound of the game horn, etc. As the video is being recorded (or after the video has been recorded) the client device 120 can recognize and annotate the dominant sounds (e.g., the match announcer, the sound of running horses, the sound of the game horn) while grouping and annotating (or tagging) the non-dominant sounds as background noise.

In an aspect, identification component 110 can also identify video frames or segments of a received video file and the respective segments of the audio track associated with each video frame. For example, with reference to FIG. 2, the video file represented by bar 202 includes four frames/segments 216, 218, 220, and 222. The identification component 110 can identify these frames of video and further identify different audio components associated with each frame. For example, bar 206, representative of the audio track for the video file, is also broken into segments corresponding to the video frames/segments 216, 218, 220 and 222, respectively. According to this aspect, the audio portion associated with different frames of a video can include different audio components. A user can further edit an audio track associated with a video on a frame-by-frame basis in addition to an audio-component-by-audio-component basis.
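A simple sketch of segmenting an extracted audio track by video frame follows, assuming mono PCM samples in a NumPy array and uniform frame boundaries; real containers carry per-frame timestamps that a production system would use instead.

```python
import numpy as np

def split_by_frames(samples: np.ndarray, rate: int, n_frames: int):
    """Return a list of audio segments, one per video frame/segment."""
    samples_per_frame = len(samples) // n_frames   # trailing remainder dropped
    return [samples[i * samples_per_frame:(i + 1) * samples_per_frame]
            for i in range(n_frames)]
```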

Editing component 112 is configured to provide editing tools for editing an audio track received by media editing system 104. In particular, editing component 112 facilitates editing different components of an audio track identified by identification component 110 and extracted by extraction component 108. For example, a user may film a video having a variety of occurring sounds or audio components. The user may desire to edit the audio portion of the video to effect different changes with respect to different sounds or audio components of the video occurring at different times of the video. Editing component 112 provides the tools that enable the user to accomplish this task.

With reference to the above example video of a polo match, the video may include the sound of a chatting crowd, the sound of the match announcer, the sound of running horses, the sound of the players grunting and calling out plays, etc. A user may want to decrease volume of the match announcer in frames identified as frames 128 and 129 and increase volume of the sound of horses running at frames 128, 129 and 160. In another example, the user may want to remove the sound of a chatting crowd altogether.

In an aspect, editing component 112 is configured to generate an editing interface that allows a user of a client device 120 to edit the audio portion of a video file. The editing interface can provide editing tools, including tools for separating an audio track from a video file, tools for applying various editing options to audio components of the audio track and tools for re-joining an edited audio track with the video track of the video file.

FIGS. 3 and 4 depict example editing interfaces generated by editing component 112. With reference to FIG. 3, presented is an editing interface 300 that displays components of an audio track of a video file in a layered view. In FIG. 3, the extracted audio track for a video file is represented by bar 206. The extracted audio track for the video file includes four different audio components represented by lines 208, 210, 212 and 214 identified by identification component 110. As noted above, the extraction component 108 is configured to extract these different audio components. The editing component 112 can then present the extracted audio components via editing interface 300.

Editing component 112 can separate each of the different audio components represented by lines 208, 210, 212 and 214 into different layers. Editing component 112 can also segment each of the different audio components by frame (e.g., frame 216, frame 218, frame 220 and frame 222). Although audio components and frames are identified in editing interface 300 by respective numbers, editing component 112 can apply various different titles to identify items of the interface. For example, where identification component 110 identifies an audio component for what it is (e.g., a siren, a song, actor John Smith, etc.), editing component 112 can place a title next to the audio component indicating what it is.

Editing component 112 enables a user to edit different components of an audio track jointly or separately. In other words, editing component 112 allows a user to edit an audio track by effecting editing changes to the audio track as a whole or in a piecemeal manner. In particular, editing component 112 allows a user to select a specific audio component for editing, a specific segment of audio associated with a frame of video for editing, and/or a specific audio component associated with a specific frame for editing. For example, with reference to interface 300, a user could select the audio component represented by line 210 and apply editing tools to the entire audio component represented by line 210 (e.g., select the row for the component represented by line 210 and apply editing tools to the entire row). In another example, a user can select frame 216 and apply editing tools to each of the audio segments associated with frame 216 (e.g., select the column for frame 216 and apply editing tools to the entire column). In another example, a user could select the audio component represented by line 210 at frame 216 for editing individually (e.g., select the cell at the component for line 210, frame 216). Still in other aspects, a user could select two or more of the audio components and/or two or more of the frames for editing jointly. It should be appreciated that various additional combinations of cell selection, row selection and/or column selection associated with interface 300 can be afforded by editing component 112.
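The row/column/cell selection model might be realized as in the sketch below, with the audio track held as a components-by-frames grid of sample arrays; the grid representation and the edit callback are assumptions made for illustration.

```python
def apply_to_selection(grid, edit, component=None, frame=None):
    """Apply `edit` to one cell, a whole row (component=...), a whole
    column (frame=...), or the entire grid when neither index is given."""
    for c in range(len(grid)):
        for f in range(len(grid[c])):
            if (component is None or c == component) and \
               (frame is None or f == frame):
                grid[c][f] = edit(grid[c][f])
    return grid

# e.g., mute the entire row for one component:
#   apply_to_selection(grid, mute, component=1)
# or edit a single cell:
#   apply_to_selection(grid, mute, component=1, frame=0)
```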

Editing component 112 can provide various editing tools or options for editing one or more components of an audio track jointly or separately. For example, as seen in interface 300, the editing component 112 provides an option to mute audio, adjust volume, adjust pitch, adjust speed, adjust tone, equalize, adjust echo, replace audio and add/remove audio (e.g., add a sound effect, remove a sound effect, add a soundtrack, etc.). It should be appreciated that the above noted editing tools are merely exemplary. Editing component 112 can be configured to provide various additional known or later developed audio editing tools. For example, additional editing tools that can be applied to one or more components of an audio track by editing component 112 can include an option to change a spoken language or an option to anonymize spoken language of a particular person. In an aspect, when editing an audio track via an interface generated by editing component 112 (e.g., using interface 300), a user can select one or more audio components to edit and then select one or more editing tools (e.g., one or more editing options 302 of interface 300) to apply to the one or more audio components. The editing component 112 is further configured to apply editing changes to the audio track. For example, with respect to interface 300, a user can select one or more audio components to edit and one or more editing options 302 to apply. The user can then select the apply button 304 to effectuate the changes.
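A few of the listed editing options could be realized as in the following sketch, operating on mono floating-point sample arrays in the range [-1, 1]; the naive speed change shown also shifts pitch, which a production tool would compensate for.

```python
import numpy as np

def mute(samples: np.ndarray) -> np.ndarray:
    return np.zeros_like(samples)

def adjust_volume(samples: np.ndarray, gain: float) -> np.ndarray:
    # clip to the valid sample range after applying the gain
    return np.clip(samples * gain, -1.0, 1.0)

def adjust_speed(samples: np.ndarray, factor: float) -> np.ndarray:
    # resample by linear interpolation; factor > 1 speeds playback up
    indices = np.arange(0, len(samples), factor)
    return np.interp(indices, np.arange(len(samples)), samples)
```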

FIG. 4 demonstrates an editing interface 400 after application of one or more editing tools to an audio track. FIG. 4 is an extension of FIG. 3. As seen in FIG. 4, various changes have been implemented to the audio track for the associated video file. In particular, the audio component represented by line 210 has been removed or muted. The audio component represented by line 212 has been removed or muted at frames 216 and 220. Further, the speed of the audio component represented by line 214 has been reduced, and the volume of the audio component represented by line 214 has been increased, as exemplified by the thickening and lengthening of line 214.

It should be appreciated that an editing interface generated by editing component 112 (e.g., interfaces 300 and 400) can provide various other tools that facilitate editing audio in accordance with known audio editing software. For example, editing interfaces 300 and 400 can allow a user to playback an edited audio track to listen to changes, adjust changes, and/or apply additional changes to the audio track.

In some aspects, an audio track may include a large number of identified audio components. However, presenting a user with a large number of different audio components may be undesirable or overwhelming. For example, where a portion of an audio track includes 15+ identified audio components, a user may not desire to listen to each component to identify the specific components the user is interested in editing (e.g., where the components are not identified for what they are but merely identified as separate/distinct sounds). Accordingly, in an aspect, editing component 112 and/or identification component 110 can be configured to discriminate between audio components and select a subset of a plurality of audio components to present to a user for editing. For example, identification component 110 can be configured to select the top five most dominant/distinct sounds for editing. In another example, identification component 110 can be configured to identify extraneous sounds or background noise in an audio track for editing (e.g., for muting). In yet another example, an inference can be made as to a subset of sounds that a user would be interested in editing (e.g., based on context, preferences, historical information . . . ).

Referencing FIG. 5, presented is a diagram of another example system 500 that facilitates editing an audio portion of a video, in accordance with various aspects and embodiments described herein. System 500 includes the same features and functionalities of system 100 with the addition of automatic enhancement component 502 and inference component 504. Repetitive description of like elements employed in respective embodiments of systems and interfaces described herein is omitted for sake of brevity.

Automatic enhancement component 502 is configured to automatically apply editing tools to one or more identified components of an audio track. In particular, automatic enhancement component 502 is configured to make editing decisions on behalf of a user to automatically edit an audio track. For example, automatic enhancement component 502 can analyze an audio track based on the various identified audio components, select one or more editing tools to apply to one or more of the different audio components and apply the one or more editing tools to the one or more different audio components.

In an aspect, a user can request automatic enhancement of an audio track by automatic enhancement component 502. For example, as seen in FIGS. 3 and 4, an editing interface can include an auto-correct button 306. In an aspect, selection of the auto-correct button 306 results in application of one or more editing tools selected by automatic enhancement component 502 to one or more audio components of an audio track selected by automatic enhancement component 502. In another aspect, automatic enhancement component 502 can be configured to automatically edit an audio track in response to receipt of the audio track, or video comprising the audio track, by receiving component 106.

In an aspect, automatic enhancement component 502 can apply various rules or algorithms stored in memory 116 that dictate manners for editing components of an audio track to automatically edit the audio track. For example, an algorithm could require audio components identified as background noise to be muted, audio components identified as music to be adjusted to volume level 5, and audio components identified as dialogue to be adjusted to volume level 8.
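Such a rule table might be expressed as in the sketch below; the class labels and volume levels echo the example above, and the mapping of a 0-10 volume level to a linear gain is an assumption.

```python
import numpy as np

ENHANCEMENT_RULES = {
    "background noise": {"action": "mute"},
    "music":            {"action": "set_volume", "level": 5},
    "dialogue":         {"action": "set_volume", "level": 8},
}

def auto_enhance(components):
    """components: iterable of (class_label, samples) pairs, samples
    being NumPy arrays that are edited in place."""
    for label, samples in components:
        rule = ENHANCEMENT_RULES.get(label)
        if rule is None:
            continue                      # unclassified sounds left untouched
        if rule["action"] == "mute":
            samples[:] = 0
        elif rule["action"] == "set_volume":
            samples[:] = samples * (rule["level"] / 10.0)  # 0-10 scale assumed
```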

In another aspect, automatic enhancement component 502 can employ inference component 504 to facilitate automatically enhancing or editing an audio track. Inference component 504 is configured to provide for or aid in various inferences or determinations associated with identifying audio components of an audio track and classifying the audio components for what the components are (e.g., identifying a component as an ambulance siren or background noise) by identification component 110. In addition, inference component 504 can facilitate inferring what audio components of an audio track to apply editing tools to and what tools to apply to those audio components. For example, inference component 504 can infer that a particular sound identified in an audio track is a motorcycle. Inference component 504 can further infer that the motorcycle sound should be increased to volume level 8 at frame 499 of a particular video. In an aspect, all or portions of media editing system 104 and media sharing system 102 can be operatively coupled to inference component 504.

In order to provide for or aid in the numerous inferences described herein, inference component 504 can examine the entirety or a subset of the data to which it is granted access and can provide for reasoning about or infer states of the system, environment, etc. from a set of observations as captured via events and/or data. An inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. An inference can also refer to techniques employed for composing higher-level events from a set of events and/or data.

Such an inference can result in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification (explicitly and/or implicitly trained) schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, etc.) can be employed in connection with performing automatic and/or inferred action in connection with the claimed subject matter.

A classifier can map an input attribute vector, x = (x1, x2, x3, x4, . . . , xn), to a confidence that the input belongs to a class, such as by f(x) = confidence(class). Such classification can employ a probabilistic and/or statistical-based analysis (e.g., factoring into the analysis utilities and costs) to prognose or infer an action that a user desires to be automatically performed. A support vector machine (SVM) is an example of a classifier that can be employed. The SVM operates by finding a hyper-surface in the space of possible inputs, where the hyper-surface attempts to split the triggering criteria from the non-triggering events. Intuitively, this makes the classification correct for testing data that is near, but not identical to, training data. Other directed and undirected model classification approaches, including, e.g., naïve Bayes, Bayesian networks, decision trees, neural networks, fuzzy logic models, and probabilistic classification models providing different patterns of independence, can also be employed. Classification as used herein also is inclusive of statistical regression that is utilized to develop models of priority.
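As a concrete illustration of f(x) = confidence(class), the sketch below trains an SVM with probability estimates using scikit-learn; the feature vectors and class labels are placeholders, not a prescribed feature set.

```python
from sklearn.svm import SVC

def train_sound_classifier(feature_vectors, labels):
    """feature_vectors: array of shape (n_samples, n_features);
    labels: class label per sample (e.g., 'siren', 'dialogue')."""
    clf = SVC(probability=True)   # enables per-class confidence estimates
    clf.fit(feature_vectors, labels)
    return clf

# per-class confidences for a new attribute vector x:
#   probabilities = clf.predict_proba([x])
```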

FIG. 6 presents a diagram of another example system 600 that facilitates editing an audio portion of a video, in accordance with various aspects and embodiments described herein. System 600 includes the same features and functionalities of system 500 with the addition of matching component 602. Repetitive description of like elements employed in respective embodiments of systems and interfaces described herein is omitted for sake of brevity.

Matching component 602 is configured to match audio components of an audio track to same or substantially similar audio files. For example, matching component 602 can access reference media files for a plurality of known audio files stored at or associated with media sharing system 102 and/or media editing system 104. Matching component 602 can employ various tools to identify matches between audio components and reference files including but not limited to audio frequency pattern comparison, audio fingerprinting comparison, or voice to text file comparison. In response to an identified match, matching component 602 can suggest replacing the matched audio component with the reference file. For instance, a user may want to replace an audio component with a matched reference file where the reference file is of better quality than the audio component.
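Of the matching tools named above, audio fingerprinting comparison might be sketched as follows using coarse spectral-peak hashes; real fingerprinting schemes are considerably more robust to noise and time offsets than this illustration.

```python
import numpy as np
from scipy.signal import stft

def fingerprint(samples: np.ndarray, rate: int) -> set:
    """Hash the dominant frequency bin of each short-time frame."""
    _, _, spectrum = stft(samples, fs=rate, nperseg=2048)
    peaks = np.argmax(np.abs(spectrum), axis=0)   # one peak bin per frame
    return {(i, int(p)) for i, p in enumerate(peaks)}

def similarity(fp_a: set, fp_b: set) -> float:
    # Jaccard overlap between the two fingerprint sets
    return len(fp_a & fp_b) / max(len(fp_a | fp_b), 1)
```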

For example, matching component 602 can match an audio component of an audio track of a live performance of a song to a reference file of a professional studio track recording of the song. The matching component 602 can further provide a user with the option to replace the live performance audio component with the studio version of the song. In another example, matching component 602 can match an audio component of an audio track of an audience laughing with a reference file of a laughing audience and suggest replacing the audio component with the reference file.

FIG. 7 presents a diagram of another example system 700 that facilitates editing an audio portion of a video, in accordance with various aspects and embodiments described herein. System 700 includes the same features and functionalities of system 600 with the addition of reproduction component 702. Repetitive description of like elements employed in respective embodiments of systems and interfaces described herein is omitted for sake of brevity.

Reproduction component 702 is configured to re-join or combine an edited audio track with an extracted video track of a video file. In particular, reproduction component 702 is configured to combine extracted audio components, as edited, into a single edited audio track and combine the edited audio track with the original extracted video track. With reference to FIG. 8, presented is an example process 800 for combining an edited audio track with an extracted video track of a video file by reproduction component 702. FIG. 8 is an extension of FIG. 4. As seen in FIG. 8, bar 802 represents an edited audio track of a video file. In particular, the edited audio track represented by bar 802 includes the edits applied to the audio track as shown in FIG. 4. For example, the edited audio track represented by bar 802 includes the original audio component represented by line 208, the edited audio component represented by line 214, no component represented by line 210 and no component represented by line 212 in frames 216 and 220. Reproduction component 702 combines the edited audio track represented by bar 802 with the original extracted video track represented by bar 204 to generate an edited video file represented by bar 806 that includes the original video track and the edited audio track.
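The re-joining step might be sketched as follows: mix the edited components into a single audio track, then mux it with the original video track; the ffmpeg invocation and file names are assumptions, and the components are assumed to be equal-length floating-point sample arrays.

```python
import subprocess
import numpy as np
from scipy.io import wavfile

def rejoin(components, rate, video_track="video_only.mp4",
           output="edited_video.mp4"):
    """Mix edited components into one audio track and mux with the video."""
    mixed = np.clip(np.sum(components, axis=0), -1.0, 1.0)
    wavfile.write("edited_audio.wav", rate, mixed.astype(np.float32))
    # -map picks the video stream from the first input, audio from the second
    subprocess.run(["ffmpeg", "-y", "-i", video_track,
                    "-i", "edited_audio.wav",
                    "-map", "0:v", "-map", "1:a", "-c:v", "copy", output],
                   check=True)
```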

FIG. 9 presents a diagram of another example system 900 that facilitates editing an audio portion of a video, in accordance with various aspects and embodiments described herein. System 900 includes the same or similar features and functionalities of other systems described herein. Repetitive description of like elements employed in respective embodiments of systems and interfaces described herein is omitted for sake of brevity.

System 900 includes client device 902, one or more networks 118, media sharing system 914 and media editing system 916. With system 900, media sharing system 914 is depicted as including media editing system 916. Media sharing system 914 and media editing system 916 can include one or more features of media sharing system 102 and media editing system 104 (and vice versa). Client device 902 can include media recording component 904, media tagging component 906 and media uploading component 908. Client device 902 includes memory 912 for storing computer executable components and instructions. Client device 902 further includes a processor 910 to facilitate operation of the instructions (e.g., computer executable components and instructions) by client device 902.

System 900 emphasizes features of an example client device 902 that facilitates recording video and/or audio and annotating audio components in accordance with an embodiment. As noted with respect to the discussion of FIG. 1, in an aspect, receiving component 106 of media editing system 104 can receive a video file having an audio track that is annotated to indicate different sounds or audio components present in the audio track. The identification component 110 can employ the annotations to easily identify different audio components of the audio track for extraction by extraction component 108. Client device 902 is configured to provide such annotated video and audio files to media sharing system 914 for editing with media editing system 916.

Media recording component 904 is configured to record video and/or audio files. For example, media recording component 904 can include a video camera and one or more microphones. Media tagging component 906 is configured to identify and tag different audio components of received/recorded audio. For example, media tagging component 906 can associate metadata with an audio track indicating distinct audio components and/or identifying what the distinct audio components are (e.g., background noise, song, dialogue, etc.). In an aspect, media tagging component 906 is configured to tag different components of audio as the audio is received/recorded. Media uploading component 908 is configured to upload a tagged or annotated media file to media sharing system 914 via a network 118.

FIG. 10 presents a diagram of another example system 1000 that facilitates editing an audio portion of a video, in accordance with various aspects and embodiments described herein. System 1000 includes the same or similar features and functionalities of system 900. Repetitive description of like elements employed in respective embodiments of systems and interfaces described herein is omitted for sake of brevity.

System 1000 demonstrates an example embodiment of a system that facilitates editing an audio portion of a video similar to system 900. However, unlike system 900, system 1000 includes a client device 1002 that includes a media editing system 1004. Media editing system 1004 can include one or more of the features and functionalities of media editing system 104. System 1000 further includes one or more networks 118, media sharing system 1008 and media editing system 1010. Media sharing system 1008 and media editing system 1010 can include one or more features of media sharing system 102 and media editing system 104 (and vice versa).

In an aspect, media editing system 1004 includes a portion of the components of media editing system 104 while media editing system 1010 includes another portion of the components of media editing system 104. For example, media editing system 1004 can include extraction component 108 and identification component 110 while media editing system 1010 includes editing component 112. In another example, media editing system 1004 can include identification component 110 while media editing system 1010 includes extraction component 108 and editing component 112. In an aspect (not shown), media sharing system 1008 does not include a media editing system. According to this aspect, the various features of media editing system 104 are provided at client device 1002.

In view of the example systems and/or devices described herein, example methods that can be implemented in accordance with the disclosed subject matter can be further appreciated with reference to flowcharts in FIGS. 11-13. For purposes of simplicity of explanation, example methods disclosed herein are presented and described as a series of acts; however, it is to be understood and appreciated that the disclosed subject matter is not limited by the order of acts, as some acts may occur in different orders and/or concurrently with other acts from that shown and described herein. For example, a method disclosed herein could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, interaction diagram(s) may represent methods in accordance with the disclosed subject matter when disparate entities enact disparate portions of the methods. Furthermore, not all illustrated acts may be required to implement a method in accordance with the subject specification. It should be further appreciated that the methods disclosed throughout the subject specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computers for execution by a processor or for storage in a memory.

FIG. 11 illustrates a flow chart of an example method 1100 for editing an audio track of a video file, in accordance with aspects described herein. At 1102, a video is received as an upload from a client device over a network (e.g., via receiving component 106). At 1104, an audio track and a video track are separated from the video (e.g., using extraction component 108). At 1106, two or more different audio components of the audio track are identified (e.g., using identification component 110). At 1108, the two or more different audio components are separated (e.g., using extraction component 108). At 1110, an editing interface is generated and at 1112, the client device is provided one or more options for editing the two or more different audio components jointly or separately via the editing interface (e.g., using editing component 112).

FIG. 12 illustrates a flow chart of another example method 1200 for editing an audio track of a video file, in accordance with aspects described herein. At 1202, a video is received as an upload from a client device over a network (e.g., via receiving component 106). At 1204, an audio track and a video track are separated from the video (e.g., using extraction component 108). At 1206, two or more different audio components of the audio track are identified (e.g., using identification component 110). At 1208, the two or more different audio components are separated (e.g., using extraction component 108). At 1210, an editing interface is generated and at 1212, the client device is provided one or more options for editing the two or more different audio components jointly or separately via the editing interface (e.g., using editing component 112). At 1214, the one or more options for editing the two or more different audio components are applied to generate an edited audio track comprising the two or more different audio components as edited (e.g., using editing component 112). At 1216, the edited audio track is combined with the video track to generate an edited video (e.g., using reproduction component 702).

FIG. 13 illustrates a flow chart of another example method 1300 for editing an audio track of a video file, in accordance with aspects described herein. At 1302, a video is received as an upload from a client device over a network (e.g., via receiving component 106). At 1304, an audio track and a video track are separated from the video (e.g., using extraction component 108). At 1306, two or more different audio components of the audio track are identified (e.g., using identification component 110). At 1308, a first editing tool is applied to a first one of the audio components and at 1310, a second editing tool is applied to a second one of the audio components, wherein the first editing tool and the second editing tool are different (e.g., using editing component 112). At 1312, the two or more different audio components are combined after the applying of the first editing tool and the second editing tool to generate an edited audio track (e.g., using reproduction component 702). At 1314, the edited audio track is joined with the video track to generate an edited video file (e.g., using reproduction component 702).

Example Operating Environments

The systems and processes described below can be embodied within hardware, such as a single integrated circuit (IC) chip, multiple ICs, an application specific integrated circuit (ASIC), or the like. Further, the order in which some or all of the process blocks appear in each process should not be deemed limiting. Rather, it should be understood that some of the process blocks can be executed in a variety of orders, not all of which may be explicitly illustrated in this disclosure.

With reference to FIG. 14, a suitable environment 1400 for implementing various aspects of the claimed subject matter includes a computer 1402. The computer 1402 includes a processing unit 1404, a system memory 1406, a codec 1405, and a system bus 1408. The system bus 1408 couples system components including, but not limited to, the system memory 1406 to the processing unit 1404. The processing unit 1404 can be any of various available processors. Dual microprocessors and other multiprocessor architectures also can be employed as the processing unit 1404.

The system bus 1408 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MSA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Card Bus, Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), Firewire (IEEE 1394), and Small Computer Systems Interface (SCSI).

The system memory 1406 includes volatile memory 1410 and non-volatile memory 1412. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 1402, such as during start-up, is stored in non-volatile memory 1412. In addition, according to present innovations, codec 1405 may include at least one of an encoder or decoder, wherein the at least one of an encoder or decoder may consist of hardware, a combination of hardware and software, or software. Although codec 1405 is depicted as a separate component, codec 1405 may be contained within non-volatile memory 1412. By way of illustration, and not limitation, non-volatile memory 1412 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory 1410 includes random access memory (RAM), which acts as external cache memory. According to present aspects, the volatile memory may store the write operation retry logic (not shown in FIG. 14) and the like. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), and enhanced SDRAM (ESDRAM).

Computer 1402 may also include removable/non-removable, volatile/non-volatile computer storage media. FIG. 14 illustrates, for example, disk storage 1414. Disk storage 1414 includes, but is not limited to, devices like a magnetic disk drive, solid state disk (SSD), floppy disk drive, tape drive, Jaz drive, Zip drive, LS-100 drive, flash memory card, or memory stick. In addition, disk storage 1414 can include storage media separately or in combination with other storage media including, but not limited to, an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R Drive), CD rewritable drive (CD-RW Drive) or a digital versatile disk ROM drive (DVD-ROM). To facilitate connection of the disk storage devices 1414 to the system bus 1408, a removable or non-removable interface is typically used, such as interface 1416.

It is to be appreciated that FIG. 14 describes software that acts as an intermediary between users and the basic computer resources described in the suitable operating environment 1400. Such software includes an operating system 1418. Operating system 1418, which can be stored on disk storage 1414, acts to control and allocate resources of the computer system 1402. Applications 1420 take advantage of the management of resources by operating system 1418 through program modules 1424 and program data 1426, such as the boot/shutdown transaction table and the like, stored either in system memory 1406 or on disk storage 1414. It is to be appreciated that the claimed subject matter can be implemented with various operating systems or combinations of operating systems.

A user enters commands or information into the computer 1402 through input device(s) 1428. Input devices 1428 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 1404 through the system bus 1408 via interface port(s) 1430. Interface port(s) 1430 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 1436 use some of the same type of ports as input device(s). Thus, for example, a USB port may be used to provide input to computer 1402 and to output information from computer 1402 to an output device 1436. Output adapter 1434 is provided to illustrate that there are some output devices 1436, like monitors, speakers, and printers, among other output devices 1436, which require special adapters. The output adapters 1434 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 1436 and the system bus 1408. It should be noted that other devices and/or systems of devices provide both input and output capabilities, such as remote computer(s) 1438.

Computer 1402 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 1438. The remote computer(s) 1438 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device, a smart phone, a tablet, or other network node, and typically includes many of the elements described relative to computer 1402. For purposes of brevity, only a memory storage device 1440 is illustrated with remote computer(s) 1438. Remote computer(s) 1438 is logically connected to computer 1402 through a network interface 1442 and then connected via communication connection(s) 1444. Network interface 1442 encompasses wire and/or wireless communication networks such as local-area networks (LAN), wide-area networks (WAN), and cellular networks. LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet, Token Ring, and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).

Communication connection(s) 1444 refers to the hardware/software employed to connect the network interface 1442 to the bus 1408. While communication connection 1444 is shown for illustrative clarity inside computer 1402, it can also be external to computer 1402. The hardware/software necessary for connection to the network interface 1442 includes, for exemplary purposes only, internal and external technologies such as modems (including regular telephone grade modems, cable modems, and DSL modems), ISDN adapters, and wired and wireless Ethernet cards, hubs, and routers.

Referring now to FIG. 15, there is illustrated a schematic block diagram of a computing environment 1500 in accordance with this disclosure. The system 1500 includes one or more client(s) 1502 (e.g., laptops, smart phones, PDAs, media players, computers, portable electronic devices, tablets, and the like). The client(s) 1502 can be hardware and/or software (e.g., threads, processes, computing devices). The system 1500 also includes one or more server(s) 1504. The server(s) 1504 can also be hardware or hardware in combination with software (e.g., threads, processes, computing devices). The servers 1504 can house threads to perform transformations by employing aspects of this disclosure, for example. One possible communication between a client 1502 and a server 1504 can be in the form of a data packet transmitted between two or more computer processes, wherein the data packet may include video data. The data packet can include metadata, e.g., associated contextual information. The system 1500 includes a communication framework 1506 (e.g., a global communication network such as the Internet, or mobile network(s)) that can be employed to facilitate communications between the client(s) 1502 and the server(s) 1504.

Communications can be facilitated via a wired (including optical fiber) and/or wireless technology. The client(s) 1502 include or are operatively connected to one or more client data store(s) 1508 that can be employed to store information local to the client(s) 1502 (e.g., associated contextual information). Similarly, the server(s) 1504 include or are operatively connected to one or more server data store(s) 1510 that can be employed to store information local to the servers 1504.

In one embodiment, a client 1502 can transfer an encoded file, in accordance with the disclosed subject matter, to server 1504. Server 1504 can store the file, decode the file, or transmit the file to another client 1502. It is to be appreciated that a client 1502 can also transfer an uncompressed file to a server 1504, and server 1504 can compress the file in accordance with the disclosed subject matter. Likewise, server 1504 can encode video information and transmit the information via communication framework 1506 to one or more clients 1502.
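
As a minimal sketch of the client-side transfer described above, assuming a hypothetical HTTP upload endpoint on server 1504 (the disclosure does not specify a transport API), a client could post an encoded file as follows:

    import requests

    # Hypothetical endpoint; the disclosure does not name a transport API.
    UPLOAD_URL = "https://server.example.com/upload"

    def transfer_encoded_file(path):
        """Transfer an encoded video file from a client to the server."""
        with open(path, "rb") as fh:
            response = requests.post(UPLOAD_URL, files={"video": fh})
        response.raise_for_status()

    transfer_encoded_file("encoded_video.mp4")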

The illustrated aspects of the disclosure may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote memory storage devices.

Moreover, it is to be appreciated that various components described in this description can include electrical circuit(s) that can include components and circuitry elements of suitable value in order to implement the embodiments of the subject innovation(s). Furthermore, it can be appreciated that many of the various components can be implemented on one or more integrated circuit (IC) chips. For example, in one embodiment, a set of components can be implemented in a single IC chip. In other embodiments, one or more of respective components are fabricated or implemented on separate IC chips.

What has been described above includes examples of the embodiments of the present invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but it is to be appreciated that many further combinations and permutations of the subject innovation are possible. Accordingly, the claimed subject matter is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims. Moreover, the above description of illustrated embodiments of the subject disclosure, including what is described in the Abstract, is not intended to be exhaustive or to limit the disclosed embodiments to the precise forms disclosed. While specific embodiments and examples are described in this disclosure for illustrative purposes, various modifications are possible that are considered within the scope of such embodiments and examples, as those skilled in the relevant art can recognize.

In particular and in regard to the various functions performed by the above described components, devices, circuits, systems and the like, the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., a functional equivalent), even though not structurally equivalent to the disclosed structure, which performs the function in the exemplary aspects of the claimed subject matter illustrated in this disclosure. In this regard, it will also be recognized that the innovation includes a system as well as a computer-readable storage medium having computer-executable instructions for performing the acts and/or events of the various methods of the claimed subject matter.

The aforementioned systems/circuits/modules have been described with respect to interaction between several components/blocks. It can be appreciated that such systems/circuits and components/blocks can include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, and according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical). Additionally, it should be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described in this disclosure may also interact with one or more other components not specifically described in this disclosure but known by those of skill in the art.

In addition, while a particular feature of the subject innovation may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “includes,” “including,” “has,” “contains,” variants thereof, and other similar words are used in either the detailed description or the claims, these terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.

As used in this application, the terms “component,” “module,” “system,” or the like are generally intended to refer to a computer-related entity, either hardware (e.g., a circuit), a combination of hardware and software, software, or an entity related to an operational machine with one or more specific functionalities. For example, a component may be, but is not limited to being, a process running on a processor (e.g., digital signal processor), a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. Further, a “device” can come in the form of specially designed hardware; generalized hardware made specialized by the execution of software thereon that enables the hardware to perform a specific function; software stored on a computer readable storage medium; software transmitted on a computer readable transmission medium; or a combination thereof.

Moreover, the words “example” or “exemplary” are used in this disclosure to mean serving as an example, instance, or illustration. Any aspect or design described in this disclosure as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the words “example” or “exemplary” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.

Computing devices typically include a variety of media, which can include computer-readable storage media and/or communications media, in which these two terms are used in this description differently from one another as follows. Computer-readable storage media can be any available storage media that can be accessed by the computer, is typically of a non-transitory nature, and can include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable storage media can be implemented in connection with any method or technology for storage of information such as computer-readable instructions, program modules, structured data, or unstructured data. Computer-readable storage media can include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other tangible and/or non-transitory media which can be used to store desired information. Computer-readable storage media can be accessed by one or more local or remote computing devices, e.g., via access requests, queries or other data retrieval protocols, for a variety of operations with respect to the information stored by the medium.

On the other hand, communications media typically embody computer-readable instructions, data structures, program modules or other structured or unstructured data in a data signal that can be transitory, such as a modulated data signal, e.g., a carrier wave or other transport mechanism, and includes any information delivery or transport media. The term “modulated data signal” or signals refers to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in one or more signals. By way of example, and not limitation, communication media include wired media, such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.

In view of the exemplary systems described above, methodologies that may be implemented in accordance with the described subject matter will be better appreciated with reference to the flowcharts of the various figures. For simplicity of explanation, the methodologies are depicted and described as a series of acts. However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described in this disclosure. Furthermore, not all illustrated acts may be required to implement the methodologies in accordance with certain aspects of this disclosure. In addition, those skilled in the art will understand and appreciate that the methodologies could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methodologies disclosed in this disclosure are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computing devices. The term “article of manufacture,” as used in this disclosure, is intended to encompass a computer program accessible from any computer-readable device or storage media.

What is claimed is:
1. A system, comprising: a memory having stored thereon computer executable components; a processor that executes at least the following computer executable components: a receiving component configured to receive a video as an upload to a website from a client device over a network; an identification component configured to: analyze audio frequencies of an audio track of the video; identify patterns in the audio frequencies; identify two or more different and concurrent audio layers of the audio track based on the patterns; and identify at least one of the audio layers as a dialogue audio layer based on the identified patterns including at least one pattern corresponding to one or more voices within the audio track; an extraction component configured to extract and separate the audio layers; an editing component configured to: generate an editing interface on the website, the interface including a set of editing options and a representation of each of the audio layers; receive, via the editing interface, input from the client device over the network regarding editing the audio layers separately, the input including a selection of at least one of the editing options and at least one of the representations of the audio layers; edit the selected audio layers based on the selected editing options; and generate an edited audio track comprising the audio layers as edited; and a reproduction component configured to combine the edited audio track with an extracted video track of the video to generate an edited video to post on the website.
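
Editorial note, by way of illustration only: claim 1 leaves the frequency analysis open-ended. One naive way such a classifier might flag a voice-like pattern, offered as a sketch and not as the claimed implementation, is to measure, frame by frame, how much spectral energy falls in the nominal speech band of roughly 300-3400 Hz:

    import numpy as np
    from scipy.signal import stft

    def voice_like_frames(samples, fs, threshold=0.6):
        """Mark STFT frames whose energy is concentrated in the speech band."""
        freqs, _, Z = stft(samples, fs=fs, nperseg=1024)
        power = np.abs(Z) ** 2
        band = (freqs >= 300.0) & (freqs <= 3400.0)
        # Fraction of each frame's total energy lying in the speech band.
        ratio = power[band].sum(axis=0) / (power.sum(axis=0) + 1e-12)
        return ratio > threshold

Frames where the mask is true could then be grouped into a candidate dialogue layer; a production system would rely on more robust pattern matching, such as the look-up table of claim 15 or the voice-to-text approach of claim 16.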
2. The system of claim 1, wherein the system is located at a server device accessible to one or more client devices via the network.

3. The system of claim 1, wherein the extraction component is further configured to separate the audio track from a video track of the video.

4. The system of claim 1, wherein the input regarding editing the two or more different and concurrent audio layers includes at least one of, a request to modify volume, a request to mute, a request to add a sound effect, a request to remove a sound effect, or a request to change pitch.
5. The system of claim 1, wherein the input regarding editing the two or more different and concurrent audio layers includes a request to apply a first editing option to a first one of the two or more different and concurrent audio layers and a request to apply a second editing option to a second one of the two or more different and concurrent audio layers, wherein the first one of the two or more different and concurrent audio layers includes the dialogue audio layer, and the first editing option and the second editing option are different.
6. The system of claim 1, wherein the audio track comprises a plurality of sequential segments respectively associated with sequential frames of the video, wherein the identification component is configured to identify two or more different and concurrent audio layers respectively associated with respective segments of the sequential segments.
7. The system of claim 1, further comprising an inference component configured to analyze the two or more different and concurrent audio layers and determine or infer an editing option to apply to at least one of the two or more different and concurrent audio layers.
8. The system of claim 1, wherein the two or more different and concurrent audio layers span an entirety of the audio track.
9. The system of claim 1, wherein the representations of each of the audio layers are presented within a respective frame of a set of layered frames.
10. The system of claim 1, wherein the identification component identifies a set of audio layers not including the dialogue audio layer as background noise.
11. The system of claim 1, further comprising an automatic enhancement component configured to automatically edit the audio layers by increasing a volume of the dialogue audio layer and decreasing or muting a volume of a remaining set of audio layers, wherein the extracted audio layers are automatically edited in response to the selected editing option including a selection that corresponds to a dialogue enhancement option.

12. The system of claim 1, further comprising a matching component configured to match one of the audio layers with a reference file, wherein the set of editing options includes an option to replace the matched audio layer with the reference file.
13. The system of claim 12, wherein the editing component replaces the matched audio layer with the reference file in response to the selected editing option including an option to replace the matched audio layer with the reference file.

14. The system of claim 13, wherein the reference file includes a music track.
15. The system of claim 1, wherein the identification component is configured to identify patterns in the audio frequencies by referencing a look-up table storing patterns corresponding to previously identified sounds.
16. The system of claim 1, wherein the identification component is configured to identify patterns in the audio frequencies by employing voice-to-text recognition to convert spoken language into a text file to identify the dialogue audio layer.
17. The system of claim 1, further comprising a media tagging component configured to associate metadata with each of the identified audio layers.
18. A method comprising: using a processor to execute the following computer executable instructions stored in a memory to perform the following acts: receiving a video as an upload from a client device over a network; analyzing audio frequencies of an audio track of the video; identifying patterns in the audio frequencies; identifying two or more different and concurrent audio layers of the audio track based on the patterns; identifying at least one of the audio layers as a dialogue audio layer based on the identified patterns including at least one pattern corresponding to one or more voices within the audio track; separating the two or more different and concurrent audio layers; generating an editing interface on a website, the interface including a set of editing options and a representation of each of the two or more different and concurrent audio layers; receiving input from the client device over the network regarding editing the two or more different and concurrent audio layers separately via the editing interface; editing the two or more different and concurrent audio layers based on the input; and generating an edited audio track comprising the two or more different and concurrent audio layers as edited.
19. The method of claim 18, wherein the receiving the input comprises receiving at least one of, a request to modify volume, a request to mute, a request to add a sound effect, a request to remove a sound effect, or a request to change pitch.
20. The method of claim 18, further comprising combining the edited audio track with an extracted video track of the video to generate an edited video.
21. The method of claim 18, wherein the receiving the input comprises receiving a request to apply a first editing option to a first one of the two or more different and concurrent audio layers and a second editing option to a second one of the two or more different and concurrent audio layers, wherein the first one of the two or more different and concurrent audio layers includes the dialogue audio layer, and the first editing option and the second editing option are different.
22. The method of claim 18, further comprising analyzing the two or more different and concurrent audio layers and determining or inferring an editing option to apply to at least one of the two or more different and concurrent audio layers.
23. A non-transitory computer-readable storage storing computer-readable instructions that, in response to execution, cause a computing system to perform operations comprising: receiving a video as an upload from a client device over a network; analyzing audio frequencies of an audio track of the video; identifying patterns in the audio frequencies; identifying two or more different and concurrent audio layers of the audio track based on the patterns; identifying at least one of the audio layers as a dialogue audio layer based on the identified patterns including at least one pattern corresponding to one or more voices within the audio track; separating the two or more different and concurrent audio layers; generating an editing interface on a website; receiving a request from the client device over the network to apply an editing option to a subset of the two or more different and concurrent audio layers via the editing interface; applying the editing option to only the subset of the two or more different and concurrent audio layers in response to the request; generating an edited audio track comprising the subset of the two or more different and concurrent audio layers in response to the applied editing option; and combining the edited audio track with an extracted video track of the video to generate an edited video.
24. The non-transitory computer-readable storage of claim 23, wherein the editing option includes at least one of, an option to modify volume, an option to mute, an option to add a sound effect, an option to remove a sound effect, or an option to change pitch.